ABSTRACT

Two questions are investigated here: What should the
relationships among measurements on individuals be estimated from the



ASSESSING DIFFERENCES BETWEEN GROUPED 
AND 

INDIVIDUAL-LEVEL REGRESSION COEFFICIENTS* 



Leigh Burstein 
University of California, 
Los Angeles 



Paper presented at the annual meetings of 
the American Educational Research Association, 
San Francisco, April 1976 






*The research reported here was partially supported by NIE research 
contract C-7A-0123. The paper was stimulated primarily by work with 
Lee Cronbach and Michael Hannan. 



The title of my presentation is somewhat misleading, though it was
more accurate last August when the proposal for this symposium was
submitted. (In Appendix A, we have directly considered certain technical
aspects of the problem suggested by the title.) The primary reason for
the shift in emphasis is that our thinking about the units of analysis
in educational research has undergone rapid evolution, and our interest in
coming to grips with methodological problems in the identification of
educational effects is more pervasive than originally imagined.

The evolution in thought can be traced to the expanded consideration 
of two key questions which arise simply because schools are aggregates 
of their teachers, classrooms, and pupils and classrooms are aggregates 
of the persons and processes within them. These general questions can be 
stated as: 

(1) What should the unit(s) of analysis be in investigations 
of educational effects and on what basis should the units 
be chosen? 

(2) Given data from the multiple levels in the analysis hierarchy, 
when and how can one estimate relations generated by models 
involving one set of levels of aggregation from relations 
generated by models based on a different set of levels? 

The second question may seem convoluted to the uninitiated, but in 
the simplest case, the question can be translated into 

(2') Under what conditions can relationships among measurements on
individuals (e.g., pupil achievement and pupil SES) be
estimated from the relationships among measurements on
aggregates of individuals (e.g., school mean achievement and
school mean SES)?

We now know a great deal about the answers to question 2 for the
simplest cases involving comparisons of models purely at two distinct
levels of aggregation (Burstein, 1974, 1975a, 1975b; Burstein and Knapp,
1975; Hannan and Burstein, 1974; Hannan et al., 1975) and are beginning
to understand better what happens when the models mix variables from
multiple levels (Burstein, 1975a, 1976; Burstein and Knapp, 1975; Burstein
and Smith, 1975; Hannan, Freeman and Meyer, 1976). Furthermore, through
creative applications of the general linear model (Keesling and Wiley,
1974; Rock, Baird, and Linn, 1972) and experimental designs (Glendening and
Porter, 1974; Poynor, 1974), we have become more sensitive to the impact
of correlated units inherent in hierarchically nested school data.

Developing Interest in Issues Concerning Data Aggregation 
Later on, I shall discuss the present state of the art in response 
to question 2 and provide an example of how work on the methodology of 
data aggregation has advanced both theoretically and substantively and 
has begun to take on a new degree of subtlety in its application. But 
first I want to provide some indication of how our thinking about and 
audience for units of analysis and data aggregation questions have 
changed in just two years. 

As part of a Division D paper session at the 1974 AERA convention,
I presented a paper entitled "Issues concerning the inferences from
grouped observations" (Burstein, 1974) which, along with a joint paper
with Mike Hannan appearing in the American Sociological Review that same
year (Hannan and Burstein, 1974), reviewed, classified, interpreted, and
hopefully expanded the work on data aggregation. In fact, the essentials
of our answers to the simple cases of question 2 above were spelled out
in these two efforts.

The two papers included, among other things, reviews of work on
sociologists' problems with ecological inference (the prime example
being Robinson (1950)) and change in units of analysis (Blalock, 1964;
Hannan, 1971), the statisticians' concerns for measurement error (e.g.,
Madansky (1959)), the political science treatment of missing data (e.g.,
Kline, Kent and Davis (1971)) and economists' treatments of economy of
analysis (Cramer, 1964; Prais and Aitchison, 1954) and confidentiality of
data (Feige and Watts, 1972). To my knowledge prior to that time (later
proven incorrect when Haney's (1974) enlightening paper on the unit of
analysis in Project Follow Through was uncovered), there had been no
discussion of any of the problems mentioned above by educational research
methodologists, with the possible exception of early papers by
Walker (1928) and Burks (1928) and an insightful note by Thorndike (1939).^1
The response to my AERA presentation was far from overwhelming
(no interested audience members and about 10 requests for the paper). In
contrast, the Hannan-Burstein ASR paper, though riddled with errors in
printing and, in retrospect, with confusing notation, continues to receive
attention (sometimes critical) from sociologists. In summary, in 1974,
questions about data aggregation as addressed by other social scientists
were just about non-existent among educational researchers.

In the summer of 1974, NIE funded two projects on methodology for
aggregating data in educational research (Mike Hannan and I are co-principal
investigators for one of the contracts). In my opinion this investment
by NIE helped expand the thinking about the effects of grouping to more
complex research situations, as evidenced in presentations on applications
and recent developments in data aggregation in educational research at
the 1975 AERA meeting (Burstein, 1975a; Hannan, Young and Nielsen, 1975).
The Hannan et al. paper dealt with the effects of grouping in multivariate
and longitudinal models and my own paper focused firmly on units
of analysis issues arising in the large-scale, regression-based studies
O = Outcome variable (e.g., achievement)

T = Treatment/Control

G = Identification of class membership

I = Input variable (e.g., entering ability)

$b_1$, $b_2$, $b_3$ = Estimates of the effects of the respective explanatory
variables.

(NOTE: T, G, and I may all be sets of dummy variables distinguishing
among the categories of a nominal variable.)

An example description of the model depicted in equation (1) is "the
pupil's performance on a mathematics test (O) is a function of whether he
received drill or meaningful learning instruction (T), the classroom in
which he received the instruction (G), and his entering ability as
measured by a mathematics pretest (I)."

The classroom-level analogue of model (1) is:

(2) $\bar{O} = \bar{b}_1 T + \bar{b}_2 G + \bar{b}_3 \bar{I} + \bar{e}$

An example of a model (2) explanation is: "class mean performance ($\bar{O}$)
is a function of the type of instruction received, the class receiving
the instruction and the mean performance of students on the pretest ($\bar{I}$)."
Note that T and G are measured the same at both the pupil and the class
level.

I would argue that, in general, models (1) and (2) answer different
questions. Furthermore, different modifications (e.g., inclusions/deletions
of T, G, or I) lead to different questions being addressed and have,
to this point, led to different decisions about the appropriate units of
analysis. We explore some of these differences below.

Standard ANOVA. In the typical experimental study, the usual model
has been:

(3) $O = b_1 T + e$ (pupil-level ANOVA),

where $b_1$ now represents the treatment effect.

Or, if the investigator were more sensitive to the problem of independence
of observations, either

(4) $\bar{O} = \bar{b}_1 T + \bar{e}$ (class-level ANOVA)

or

(5) $O = b_1 T + b_2 G + e$ (nested ANOVA, or pooled within-class ANOVA).

The choice among (3)-(5) has been the subject of discussion dating
back to at least Lindquist (1940), Cochran (1947), and McNemar (1940) and
has also been considered by Peckham et al. (1969), Glass and Stanley (1970),
Wiley (1970) and more recently by Poynor (1974) and Glendening and Porter
(1974).

The weight of the evidence is that most educational research using
intact classrooms employs model (3) though model (5) and, perhaps, model (4)
are superior (Glendening and Porter, 1974; Poynor, 1974). Both
models (4) and (5) take into account the fact that there are groups of
individuals whose responses are correlated and are thus more attuned to
the realities of the situation.

My primary reason for preferring model (5) is that "independence of
observations" is a matter of degree rather than existence in research
and if group differences are small, more powerful tests of effects may be
possible with the nested model. (Glendening's and/or Poynor's presentation
may have more to say with regard to this particular model.)

Standard ANCOVA. In the typical analysis of covariance problem, the
usual model is

(6) $O = b_1 T + b_3 I + e$

where $b_3$ now represents the pooled within-treatment regression coefficient.

The adjustment, $b_3 I$, is the appropriate one when the assumptions of
parallelism of regression slopes and independence of treatments (T) and
covariate (I) can be met. For example, in aptitude-x-treatment interaction
research, the assumptions require that the relationship of outcome
achievement to entering aptitude be the "same" (not significantly
different) for each treatment group, and the treatment should be uncorrelated
with entering aptitude.

As Cronbach (1976; Cronbach and Webb, 1975) has pointed out, model
(6) is highly likely to be inappropriate when intact classrooms are
sampled (whether or not students within classrooms are randomly assigned
to treatments). He urges that between-class and within-class analyses be
conducted instead by examining the following models:

(7) $\bar{O} = \bar{b}_1 T + \bar{b}_3 \bar{I} + \bar{e}$ (between-class)

and

(8) $O = b_{1w} T + b_{3w} I + e_w$ (within-class)

(Note that the within-class analysis might also be done using our
model (1).)

The greater sensitivity implicit in Cronbach's proposed analyses is an
important step forward in the use of analysis of covariance with hierarchical
data. The between-class and within-class analyses do not remove the
need for concern about homogeneity of regression; in fact, they should
increase the investigator's wariness regarding this problem and add the
need to watch for lack of independence between classrooms and covariate.
The startling reversals that Cronbach and Webb found are perhaps warning
enough.

The methodology on the application of ANCOVA in non-equivalent groups
designs is in an expansionist phase. Cronbach's comments (1976) suggest
that something might be gained from recognizing the parallels between the
analysis of covariance model and the models we (Burstein, 1974, 1975a, b;
Burstein and Knapp, 1975; Hannan, 1971; Hannan and Burstein, 1974) have
proposed for identifying the effects of grouping. (Fennessey (1968) and
Werts and Linn (1969, 1971) deserve the credit for first noting this
parallel.) The model incorporating the grouping variable, discussed in
the next section on regression-based studies, takes the form:

(6') $O = b_2 G + b_3 I + e$ (see equation (11) below).

Equation (6') could be viewed as (6) where group membership represents
the treatment, or (6) can be considered to be (6') when the group membership
is defined by treatment. In any case, though the relationship of
interest is different for the two models (effects of treatment (T) in
ANCOVA; relationship to input (I) in regression), the same phenomenon
should sanction or invalidate their use. So persons working on either
problem should be able to learn from the work on the other.

Standard Regression. In the typical regression-based or correlational
study, we generally find the model:

(9) $O = b_3 I + e$ (individual-level).

Or, sometimes, despite the fact that the relationship of interest is
one posited to exist among individuals, we find class-level analyses as
depicted in equation (10):

(10) $\bar{O} = \bar{b}_3 \bar{I} + \bar{e}$ (class-level).

Except under very special circumstances (e.g., groups randomly formed,
or groups formed on the basis of I), the appropriate model for hierarchical
data or intact classrooms includes the grouping variable G in an
individual-level analysis:

(11) $O = b_2 G + b_{3w} I + e$.

That is, if the researcher were interested in the relationship of student
learning as measured by an achievement test to student entering aptitude
or to students' family background, then $b_{3w}$ (the pooled within-class
regression coefficient) is deemed preferable to $b_3$ (the individual
coefficient) and certainly to $\bar{b}_3$ (the between-groups coefficient).

The reason for preferring the within-class coefficient to the
individual coefficient is that equation (11) is more correctly specified,
having better accounted for the factors affecting O than does equation
(9). And, the one benefit from calculating the between-groups coefficient
from model (10) is that we have evidence that equation (9) is misspecified,
and at least G should be incorporated, when $\bar{b}_3 - b_3 \neq 0$ (Burstein, 1975a, b;
Cronbach, 1976; Hannan, Young and Nielsen, 1975).

It is worth reiterating that the nature of the question is the
primary determiner of the appropriate choice of analysis model. We might
have asked a question requiring a between-class analysis as described by
model (10) (e.g., Does the teacher's training (I) affect the amount of
class time spent on drill activities (O)?; both variables measured at the
class level). Or, when both outcome and input were determined prior to
assignment to classrooms (e.g., relationship of students' family background
(I) to their entering ability (O)), the individual-level regression
model (equation (9)) is preferable to either model (10) or model (11).

The latter point about the preference for the individual-level model is
a tricky one. Suppose that some form of tracking was employed to assign
students to classrooms. The result would probably be that G is correlated
with both aptitude and background. If this were the case, then the
classes would be internally homogeneous with respect to O and I and most
of the variation would lie between classes. Under these circumstances,
$\bar{b}_3$ is an inflated estimate of the desired relationship ($b_3$ in this case)
and the pooled within-class coefficient ($b_{3w}$) is likely to be an
underestimate.

One last point about the regression case. Most of the work by my
colleagues and me (Burstein, 1974, 1975a, b, 1976; Burstein and Hannan,
1975; Burstein and Knapp, 1975; Burstein and Smith, 1975; Hannan, 1971;
Hannan and Burstein, 1974; Hannan, Freeman and Meyer, 1976; Hannan, Young,
and Nielsen, 1975) has focused on the difference between the between-groups
regression coefficient ($\bar{b}_3$) and the individual-level regression coefficient
($b_3$). Cronbach (1976; see also Cronbach and Webb (1975)) recommends that
a two-step analysis be carried out (between-groups, $\bar{b}_3$, and within-groups,
$b_{3w}$). We are in agreement with his general recommendation and below we
discuss our strategy for carrying it out (see section on Multilevel
Analysis).

It is important to point out, however, that the guidelines we have
previously proposed for determining when the between-group coefficient
yields poor estimates of the individual-level coefficient also identify
cases where between-groups coefficients yield poor estimates of the pooled
within-groups coefficient. This occurs because of the inextricable
linkages among $b_3$, $\bar{b}_3$ and $b_{3w}$. After all, covariances and variances at
the individual level can be decomposed into their within-group and
between-group components so that guidance regarding the relationship of $\bar{b}_3$ to $b_3$
also provides guidance for the relation of $\bar{b}_3$ to $b_{3w}$.
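
The linkage described above can be made concrete. Because individual-level sums of squares and cross-products split exactly into between-class and within-class parts, the individual-level slope is a weighted average of the between-class and pooled within-class slopes, with weight equal to the share of the input variance lying between classes (written eta2 here). The following is a minimal sketch in Python on simulated data; the data-generating values, the function names, and the symbol eta2 are illustrative assumptions and are not taken from the papers cited.

```python
import random

def slope(x, y):
    """OLS slope of y on x for a single regressor with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n_classes=30, n_pupils=25, seed=0):
    """Hypothetical pupils nested in classes; returns (class id, I, O) rows."""
    rng = random.Random(seed)
    rows = []
    for g in range(n_classes):
        class_mean = rng.gauss(0, 1)                         # class mean of input I
        class_effect = 0.8 * class_mean + rng.gauss(0, 0.3)  # class contribution to O
        for _ in range(n_pupils):
            i = class_mean + rng.gauss(0, 1)
            o = 0.5 * i + class_effect + rng.gauss(0, 1)
            rows.append((g, i, o))
    return rows

def three_slopes(rows):
    """Return (individual, between-class, pooled within-class slope, eta2)."""
    sums = {}
    for g, i, o in rows:
        n, si, so = sums.get(g, (0, 0.0, 0.0))
        sums[g] = (n + 1, si + i, so + o)
    mean_i = {g: si / n for g, (n, si, so) in sums.items()}
    mean_o = {g: so / n for g, (n, si, so) in sums.items()}

    I = [i for _, i, _ in rows]
    O = [o for _, _, o in rows]
    I_bar = [mean_i[g] for g, _, _ in rows]      # class means, one copy per pupil
    O_bar = [mean_o[g] for g, _, _ in rows]
    I_dev = [i - mean_i[g] for g, i, _ in rows]  # deviations from class means
    O_dev = [o - mean_o[g] for g, _, o in rows]

    grand = sum(I) / len(I)
    # Share of the variance of I that lies between classes.
    eta2 = sum((m - grand) ** 2 for m in I_bar) / sum((i - grand) ** 2 for i in I)
    return slope(I, O), slope(I_bar, O_bar), slope(I_dev, O_dev), eta2

b3, b3_bar, b3_w, eta2 = three_slopes(simulate())
print(b3, b3_bar, b3_w, eta2)
# The individual-level slope equals the eta2-weighted average of the other two.
print(abs(b3 - (eta2 * b3_bar + (1 - eta2) * b3_w)) < 1e-9)
```

Since the three coefficients are tied together by one identity, knowing how far the between-class slope lies from the individual-level slope also fixes how far it lies from the within-class slope, for a given eta2.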

Multilevel Analysis. In previous papers (Burstein, 1975a, 1976;
Burstein and Knapp, 1975; Burstein and Smith, 1975), I have suggested
that we begin to utilize multilevel designs in regression-based
analyses of school effects.^4 In such analyses, each variable is measured
and analyzed at the lowest level at which the observations on the measure
tend to vary independently. Cronbach's presentation (1976) urges essentially
the same procedure.

The two examples of multilevel analyses which are most frequently
cited are a study by Rock, Baird, and Linn (1972) of the interaction of
student aptitude and college characteristics and Keesling and Wiley's
(1974) reanalysis of a subset of the Coleman data. The basic analysis
steps in each method are discussed below in the context of a two-level
school effects problem.

(A) Rock, Baird and Linn (1972):

(1) Calculate within-class regressions of outcome (O) on
input (I),

(2) cluster classrooms on the basis of the parameters
($\alpha$, $\beta$, plus mean predictor score for the class) of
their within-class regressions,

(3) generate discriminant functions to test for statistical
distinctions among the clusters of classes, and

(4) identify classroom-level variables that discriminate
among the clusters at the classroom level.

(B) Keesling and Wiley:

(1) Perform within-class regressions of O on I (they used
the common pooled within-class coefficient in this step,
presumably for simplicity),

(2) aggregate pupils' predicted scores to the class level ($\hat{\bar{O}}$),
and

(3) in a between-class analysis, regress $\bar{O}$ on $\hat{\bar{O}}$ and
class-level input variables.
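
The three steps listed under (B) can be sketched as follows. This is only an illustration in Python on simulated data, not Keesling and Wiley's actual procedure: the pooled within-class slope is used in step 1 (as the text says they did), the predicted scores are anchored at the grand means (an assumption made here, since the intercept convention is not stated above), and the class-level variable `c` and all numeric values are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def ols2(y, x1, x2):
    """OLS slopes of y on two regressors (with intercept) via the normal equations."""
    y, x1, x2 = center(y), center(x1), center(x2)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

def simulate(n_classes=40, n_pupils=25, seed=2):
    """Hypothetical classes: (class-level input c, pupil inputs I, pupil outcomes O)."""
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        c = rng.gauss(0, 1)   # hypothetical class-level input variable
        m = rng.gauss(0, 1)   # class mean of pupil input
        I = [m + rng.gauss(0, 1) for _ in range(n_pupils)]
        O = [0.5 * i + 0.7 * c + rng.gauss(0, 1) for i in I]
        classes.append((c, I, O))
    return classes

def keesling_wiley(classes):
    # Step 1: pooled within-class slope of O on I.
    sxy = sxx = 0.0
    for _, I, O in classes:
        di, do = center(I), center(O)
        sxy += sum(a * b for a, b in zip(di, do))
        sxx += sum(a * a for a in di)
    b_w = sxy / sxx
    # Step 2: class means of the pupils' predicted scores
    # (prediction anchored at the grand means; an assumption of this sketch).
    n_total = sum(len(I) for _, I, _ in classes)
    grand_i = sum(sum(I) for _, I, _ in classes) / n_total
    grand_o = sum(sum(O) for _, _, O in classes) / n_total
    pred_bar = [grand_o + b_w * (sum(I) / len(I) - grand_i) for _, I, _ in classes]
    # Step 3: between-class regression of class mean outcome on the
    # aggregated prediction and the class-level input.
    o_bar = [sum(O) / len(O) for _, _, O in classes]
    c = [cv for cv, _, _ in classes]
    return b_w, ols2(o_bar, pred_bar, c)

b_w, (coef_pred, coef_c) = keesling_wiley(simulate())
print(b_w, coef_pred, coef_c)
```

Because a single pooled slope is used in step 1, classes that differ only in their within-class slopes receive the same kind of adjustment in this sketch.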

From my perspective, each method has certain merits and certain drawbacks.
The Keesling and Wiley approach provides effect parameters more
nearly mirroring the structural form of school effects than the Rock,
Baird and Linn approach or the usual single-level analysis models. Yet,
the appropriate algorithm for actually performing the analysis suggested
by Keesling and Wiley is unclear. Should pooled within-class coefficients
be used in step 1 or are the individual within-class coefficients more
appropriate? How can we best aggregate predicted outcomes for entry into
the between-class analyses? Does this approach provide adequate adjustment?

The above questions about the Keesling and Wiley approach are
important, but my most serious concern is that their approach fails to
reflect adequately the effects of between-class differences in slopes.
For example, Figure 1 depicts outcome-on-input regressions for hypothetical
classrooms. I would expect Keesling and Wiley's strategy to be able to
distinguish among the performances in classrooms (A) through (C) (and (F)
and (E), for that matter) quite easily and be able to separate the effects
in (A)-(C) from those in (D), (E) and (F). But is their technique sensitive
to the case where the slopes are quite different with common means on
the outcome and input variables (comparing (D) with (E))? If not, some
improvements are needed in the Keesling and Wiley approach.

Theoretically, the approach suggested by Rock, Baird and Linn should
place classrooms (D) and (E), and perhaps all classrooms in Figure 1, into
separate clusters. In practice, I am not so sure that this will be the
case. Currently available clustering algorithms like those due to Ward 
(1963) and to Johnson (1967) are dependent on the choice of characteristics 
on which to base the formation of clusters. Also, there would likely be 
stability problems in some clusterings that make the discrimination among 
clusters all the more difficult. 

Moreover, treating the resulting clusters as groups in a discriminant
analysis, as Rock, Baird, and Linn do, discards any metric differences
existing among the clusters and thereby eliminates the possibility of
describing school effects in structural terms. (I raised the same concerns
about the analysis reported by ETS from Phase II of the Beginning Teacher
Evaluation Study (McDonald et al., 1975).) The use of discriminant groups
results in some cost in generalizability of findings that should be avoided.

Figure 1. Regression of Outcome on Input for Different Hypothetical
Classrooms. [Figure not reproduced; horizontal axis: INPUT, vertical
axis: OUTCOME.]

The problem, then, is that though we have candidates for performing our
multilevel analysis, each has drawbacks. I think that it would be worthwhile
to explore the following alternative:

(C) An Alternative Multilevel Analysis Strategy:

(1) Perform within-class regressions (not pooled) of outcomes
on input, and

(2) use the parameters ($\alpha$, $\beta$, perhaps SEE) from the
within-class regressions as "outcomes" in a between-class
analysis.

This alternative strategy combines certain features of approaches by
Keesling and Wiley and by Rock, Baird and Linn. The technique should be
able to treat the classrooms depicted in Figure 1 as "different" and provide
effect estimates in structural terms. In fact, using the within-class
parameter estimates as outcomes should lead to more sensitive interpretations
of effects and clearer policy implications from findings. For if
one needs to decide whether classrooms have behaved in a compensatory
fashion and, if so, at what levels of input, the proposed strategy has
merit.
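
A minimal sketch of the alternative strategy (C), again in Python on simulated data, may help fix ideas. Separate (unpooled) regressions are fit within each class, and the resulting intercepts and slopes are then treated as outcomes in a between-class regression on a class-level variable. The class-level variable `c`, the simulated effect sizes, and the function names are hypothetical, and the sketch ignores the differing precision of the within-class estimates, which a fuller treatment would need to address.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return my - b * mx, b

def simulate_classes(n_classes=40, n_pupils=30, seed=1):
    """Hypothetical classes whose intercepts and slopes depend on a class variable c."""
    rng = random.Random(seed)
    classes = []
    for _ in range(n_classes):
        c = rng.gauss(0, 1)            # hypothetical class-level variable
        true_slope = 0.6 - 0.2 * c     # higher c: flatter, more "compensatory" slope
        true_intercept = 1.0 + 0.5 * c
        I = [rng.gauss(0, 1) for _ in range(n_pupils)]
        O = [true_intercept + true_slope * i + rng.gauss(0, 0.5) for i in I]
        classes.append((c, I, O))
    return classes

def multilevel(classes):
    # Step 1: separate within-class regressions of outcome on input.
    cs, alphas, betas = [], [], []
    for c, I, O in classes:
        a, b = fit_line(I, O)
        cs.append(c)
        alphas.append(a)
        betas.append(b)
    # Step 2: within-class parameters become "outcomes" in a between-class analysis.
    return fit_line(cs, alphas), fit_line(cs, betas)

(alpha_int, alpha_on_c), (beta_int, beta_on_c) = multilevel(simulate_classes())
print("effect of c on class intercepts:", alpha_on_c)
print("effect of c on class slopes:", beta_on_c)
```

Read this way, a negative effect of the class-level variable on the within-class slopes would be one indication of compensatory behavior, and the intercept equation indicates the level of input at which it operates.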

Necessary Caveats 
It is perhaps appropriate to end the paper with some caveats about
what I feel are potentially the best analytical methods for handling
hierarchical data. To my knowledge,^5 the proposed alternative strategy for
multilevel analysis has never been tried out in the form described here,
much less in comparison with other approaches. Furthermore, we
have considered only analysis procedures in the two-level case when it is
obvious that a great deal of educational effects research involves at
least a third level (the school).

The main point is that we have to move in this direction if we have
any hope of avoiding the painful, tedious, unparsimonious alternative of
looking at effects one class (or maybe even one person) at a time. If
the current rate of improvement in dealing with units of analysis problems
continues, I should be able to close on a more optimistic note this time
next year.
FOOTNOTES

^1 The exchange among Bloom, Gagné and Wiley as recorded in Wittrock and
Wiley (1970) also predates our work, but addresses the problem from a
perspective not reflected in the work from the other social sciences.

^2 It seems to me that the difference in the independence of treatment
and response for between-class and within-class allocation to treatment
or control is a matter of degree rather than existence (of independence),
and for the moment, we will treat both experimental setups in the same
fashion.

^3 I will use $b_1$, $b_2$, $b_3$ and $e$ in all subsequent models to represent the
same parameters and variables at the individual level though their
values might change. $\bar{b}_1$, $\bar{b}_2$, $\bar{b}_3$ and $\bar{e}$ represent their corresponding
group values. A w subscript will be used to denote a within-class
coefficient, e.g., $b_{3w}$.

^4 The same analysis carried out at two or more levels does not qualify
as "multilevel analysis" by the present definition. Thus the analyses
of covariance at the student, class, school and sponsor levels in the
Project Follow Through studies (Abt Associates, 1973; Emrick, Sorensen
and Stearns, 1973; see Haney (1974) for discussion) are not considered
to be multilevel.

^5 Apparently, Baker and Snow (1972) explore a related procedure to assess
teacher differences in student aptitude-achievement relationships.
APPENDIX A: Comparison of the Werts-Linn Approach with Hannan-Burstein
and Feige-Watts Approaches for Assessing the Difference
between Grouped and Ungrouped Estimators of Regression
Coefficients.

In my 1975 AERA presentation (Burstein, 1975a), two techniques
(Feige and Watts, 1972; Hannan and Burstein, 1974) for assessing differences
between grouped and ungrouped coefficients in the single-regressor case were
discussed. These techniques can be used to determine whether observations
grouped according to a specific variable Z (called "A" in Hannan and
Burstein and expressed in the grouping matrix G in Feige and Watts) yield
accurate estimates of the corresponding ungrouped coefficients.

Developments over the past year suggest the need for augmentation
of the list of methods for assessing differences between grouped and
ungrouped coefficients. At a conference sponsored by the Measurement and
Methodology Division of NIE at Annapolis, Maryland, educational statisticians
raised several questions regarding the F-statistic that Feige and Watts
(1972) use to assess the divergence between grouped and ungrouped estimators.
Burstein (1975, p. 119) describes the main concern of the statisticians
about the appropriateness of the Feige-Watts F-test. Below we discuss the
questions raised and provide an alternative test to the one provided by
Feige and Watts.

Recently, Firebaugh (personal communication) suggested that a
regression model described earlier by Werts and Linn (1971), which includes
both the regressor, $X_{ij}$, and the "compositional effect", $\bar{X}_{.j}$ (the mean on
the regressor for the jth group), is more suitable for assessing the
differences of grouped and ungrouped regression coefficients. Preliminary
findings from our empirical analyses indicate that the Werts-Linn model has
potential in the single-regressor case. As our examples demonstrate, the
bias from grouping can be more accurately predicted from their model
(Werts and Linn, 1971) than from the "Structural-Equations" model
(Hannan and Burstein, 1974) recommended by Hannan and Burstein under a
variety of conditions. Further comparison of W-L with the Feige-Watts (F-W)
F-statistic yielded the same incongruities evidenced in the comparison of
H-B with F-W. Below we first outline the W-L approach and restate the
equations for the structural equations (H-B) and for the original and
alternative Feige-Watts models. Data from a large midwestern university are used
in an empirical illustration of the three models. Finally, we point out some
potential advantages and disadvantages of each approach and offer directions
for future comparative research.

It is important to remember that we are discussing the comparative
merits of estimating $\beta_{YX}$ from the ungrouped model

(A.1) $Y_{ij} = \alpha + \beta_{YX} X_{ij} + u_{ij}$

from grouped data. The fact that another model may be appropriate (as when
the single regression model is misspecified) is not considered here though
it is addressed in Hannan and Burstein, in Firebaugh, in Feige and Watts and
elsewhere.

Model Description

Werts and Linn. The original basis for the model suggested by Firebaugh
is an ETS technical report on regression analysis for compositional effects by
Werts and Linn (1969) and their subsequent paper on making inferences within
the Analysis of Covariance model (Werts and Linn, 1971). The model Werts
and Linn discuss (1971, pp. 407-08) is

(Note: Our notation differs from theirs; for example, we use $u_{ij}$ where they use $e_{ij}$.)

(A.2) $Y_{ij} = \alpha_j + \beta_w X_{ij} + u_{ij}$

where $\alpha_j$ = the Y-intercept of the Y-on-X regression for group j ($= \bar{Y}_{.j} - \beta_w \bar{X}_{.j}$),

$\beta_w$ = pooled within-group regression slope, and

$u_{ij}$ = usual error term independent of $\alpha_j$ and $X_{ij}$.

When the standard ANCOVA model is applied to the analysis of compositional
effects, the equation is written as (our notation)

(A.3) $Y_{ij} = \alpha' + \beta_{YX\cdot\bar{X}} X_{ij} + \beta_{Y\bar{X}\cdot X} \bar{X}_{.j} + u'_{ij}$

where $\beta_{YX\cdot\bar{X}}$ = the pooled within-group regression coefficient and

$\beta_{Y\bar{X}\cdot X}$ = difference of the pooled within-group coefficient from the
between-group regression coefficient ($\beta_{\bar{Y}\bar{X}} - \beta_w$).

Also, according to Werts and Linn (1971, p. 414),

$\beta_{Y\bar{X}\cdot X} = \beta_{ZX}$.

$\beta_{Y\bar{X}\cdot X}$ is called the "compositional" effect and it represents the
effects of group composition after holding constant individual influences on
outcome. Werts and Linn point out that (a) the analysis of "compositional"
effects corresponds to the ANCOVA model in which treatments are not
independent of the covariate, (b) the slope $\beta_{Y\bar{X}\cdot X}$ ($= \beta_{ZX}$) represents the net influence
of composition and (c) the "compositional" effect is part of the "treatment"
effect in the ANCOVA model.

Though these earlier treatments do not make the point explicit,
subsequent communication with Firebaugh and with Linn indicates that $\beta_{Y\bar{X}\cdot X} = 0$
in equation A.3 is a necessary and sufficient condition for the estimator of
$\beta_{\bar{Y}\bar{X}}$ from the model

(A.4) $\bar{Y}_{.j} = \alpha'' + \beta_{\bar{Y}\bar{X}} \bar{X}_{.j} + \bar{u}_{.j}$

to be a consistent estimator of $\beta_{YX}$ from equation A.1. That is, the grouped
estimator, $b_{\bar{Y}\bar{X}}$, of $\beta_{\bar{Y}\bar{X}}$ will be an inconsistent estimator of $\beta_{YX}$ (the ungrouped
parameter) only when $\beta_{Y\bar{X}\cdot X} \neq 0$. This occurs whenever there is a
"compositional" effect.

So one need only examine the estimator of $\beta_{Y\bar{X}\cdot X}$ in this analysis to
assess the consistency of the estimation from grouped observations. Note
also that the metric of the grouping variable is irrelevant since the analytical
model for assessing differences (equation A.3) uses the individual
observations and the group mean for the regressor in its regressor set.
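
The relation between the coefficients of equation A.3 and the between-group and within-group slopes can be checked numerically. The sketch below, in Python on simulated data, regresses Y on both the individual regressor and its group mean, and compares the two fitted coefficients with the pooled within-group slope and with the difference between the between-group and within-group slopes. The data-generating values and function names are illustrative assumptions; the between-group slope is computed with each group weighted by its size.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def slope(x, y):
    """OLS slope of y on x (single regressor with intercept)."""
    x, y = center(x), center(y)
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def ols2(y, x1, x2):
    """OLS slopes of y on two regressors (with intercept) via the normal equations."""
    y, x1, x2 = center(y), center(x1), center(x2)
    s11 = sum(a * a for a in x1)
    s22 = sum(a * a for a in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * b for a, b in zip(x1, y))
    s2y = sum(a * b for a, b in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

def simulate(n_groups=20, n_per=15, seed=3):
    """Hypothetical grouped data in which the group mean of X has its own effect on Y."""
    rng = random.Random(seed)
    ids, X, Y = [], [], []
    for j in range(n_groups):
        mu = rng.gauss(0, 1)
        for _ in range(n_per):
            x = mu + rng.gauss(0, 1)
            ids.append(j)
            X.append(x)
            Y.append(0.4 * x + 0.6 * mu + rng.gauss(0, 1))
    return ids, X, Y

def compositional_check(ids, X, Y):
    groups = sorted(set(ids))
    mx = {g: sum(x for i, x in zip(ids, X) if i == g) / ids.count(g) for g in groups}
    my = {g: sum(y for i, y in zip(ids, Y) if i == g) / ids.count(g) for g in groups}
    X_bar = [mx[i] for i in ids]   # group mean of X, repeated for each member
    Y_bar = [my[i] for i in ids]
    b_within = slope([x - mx[i] for i, x in zip(ids, X)],
                     [y - my[i] for i, y in zip(ids, Y)])
    b_between = slope(X_bar, Y_bar)
    b_x, b_xbar = ols2(Y, X, X_bar)   # fitted coefficients of equation A.3
    return b_within, b_between, b_x, b_xbar

b_within, b_between, b_x, b_xbar = compositional_check(*simulate())
print(b_x, b_within)                  # coefficient on X vs. pooled within-group slope
print(b_xbar, b_between - b_within)   # coefficient on group mean vs. slope difference
```

In this sketch the coefficient on the group mean is exactly the gap between the between-group and within-group slopes, so testing whether it is zero is the same as testing whether the grouped and within-group slopes coincide.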

Hannan and Burstein. Hannan and Burstein (1974) dealt primarily with
the case of an ordered grouping characteristic. They proposed that the
grouping variable Z be incorporated in A.1 and its structural relations to
X and Y be examined by estimating the parameters from the model:

(A.5) $Y_{ij} = \alpha + \beta_{YX\cdot Z} X_{ij} + \beta_{YZ\cdot X} Z_{ij} + v_{ij}$,

where $Z_{ij} = Z_{.j}$ ($i = 1, \ldots, n_j$ persons in groups $j = 1, \ldots, m$,
respectively).

One version of their bias formula is

(A.6) $\theta_{YX} = \beta_{YZ\cdot X}\,\beta_{XZ} \left[ \dfrac{\sigma^2_{\bar{Z}}}{\sigma^2_{\bar{X}}} - \dfrac{\sigma^2_Z}{\sigma^2_X} \right]$

from which they deduce that there is no aggregation bias when any of the
following conditions hold:

(i) Z has no effect on Y net of X: $\beta_{YZ\cdot X} = 0$.

(ii) Z has no effect on X: $\beta_{XZ} = 0$.

(iii) the ratio of the variances of Z and X between groups equals
the ratio of their total variances.

Since $Z_{ij} = Z_{.j}$, $\sigma^2_{\bar{Z}} = \sigma^2_Z$, and condition (iii) becomes

(iii') $\sigma^2_{\bar{X}} = \sigma^2_X$,

and thus (A.6) can also be written as

(A.6') $\theta_{YX} = \beta_{YZ\cdot X}\,\beta_{XZ}\,\sigma^2_Z \left[ \dfrac{\sigma^2_X - \sigma^2_{\bar{X}}}{\sigma^2_X\,\sigma^2_{\bar{X}}} \right]$

[Note: If $\sigma^2_X = 0$ or $\sigma^2_{\bar{X}} = 0$, the bias is indeterminate.]

In the context of the present question, we can infer from the
Hannan-Burstein approach that the difference between grouped and ungrouped
coefficients ($\beta_{\bar{Y}\bar{X}} - \beta_{YX}$) is a function of
conditions (i)-(iii'). Several years of experience with models of the type
described here indicate that the relation of the grouping variable, Z, to
outcome Y after fixing the regressor, X (i.e., $\beta_{YZ\cdot X}$), is a
crucial determinant of the divergence of grouped and ungrouped regression
coefficients.

It is also pertinent to note that if conditions (i)-(iii') are not
satisfied, the initial model (A.1) is misspecified due to a correlation
between its regressor and an omitted regressor which is incorrectly
incorporated in the disturbance. Thus, the reason that
$\beta_{\bar{Y}\bar{X}}$ diverges from $\beta_{YX}$ is that A.1 was
inappropriate in the first place.

Feige and Watts. The details of the Feige-Watts technique can be
found in the original source (1972) and in later discussions by Burstein
(1975a, 1975b). Feige and Watts developed a measure of the divergence
between estimators of grouped and ungrouped coefficients, $\bar{b}$ and
$b$, in the multivariate case. They attributed the divergence to three
sources: (i) specification bias, (ii) bias introduced by grouping that is
not independent of the disturbances from the structural model, and (iii)
sampling error induced by the loss of information through grouping.

For the sake of comparison, we will provide below a single-regressor
version of their "F-statistic" for divergence. First, however, we describe
the components of their statistic as it was developed so that our
recommended alterations can be better understood.

The Feige-Watts F-statistic is predicated on the following development:

(1) Under the null hypothesis that the grouped and ungrouped
coefficients are the same ($\bar{\beta} = \beta$), the divergence between
grouped and ungrouped estimators ($\bar{b}$ and $b$, respectively),
$\Delta = \bar{b} - b$, has a zero mean

$E(\Delta) = E(\bar{b} - b) = 0$

and a variance-covariance matrix

$V(\Delta) = \sigma^2_u \left[ (\bar{X}'\bar{X})^{-1} - (X'X)^{-1} \right]$,

where $\bar{X}'\bar{X}$ and $X'X$ are the between-groups and total
matrices of sums of squares and cross-products for the regressors and
$\sigma^2_u$ is the variance of the disturbance.

(2) Let $\bar{e} = \bar{Y} - \bar{X}\bar{b}$ so that $\bar{e}'\bar{e}$ is
the sum of squared residuals from the between-groups regression.

(3) According to Feige and Watts,

$Q_1 = \frac{\Delta' \left[ (\bar{X}'\bar{X})^{-1} - (X'X)^{-1} \right]^{-1} \Delta}{\sigma^2_u}$

and

$Q_2 = \frac{\bar{e}'\bar{e}}{\sigma^2_u}$

are distributed as $\chi^2$ with k and m-k degrees of freedom,
respectively, where m is the number of groups and k is the number of
regressors.

(4) Assuming correct specification and independence of X and u,
Feige and Watts claim that

(A.7) $F = \frac{Q_1 / k}{Q_2 / (m-k)}$

is distributed as an F-statistic with k and m-k degrees of
freedom. Values of their F-statistic beyond the critical
values of the F-distribution indicate differences between
estimators that cannot be attributed to sampling error.
It can be shown that the single-regressor analogue of equation (A.7)
can be written as

(A.8) $F = \frac{(b_{\bar{Y}\bar{X}} - b_{YX})^2 \left[ \frac{1}{SS(\bar{X})} - \frac{1}{SS(X)} \right]^{-1}}{SS(res)/(m-1)}$

where SS($\bar{X}$) = between-group sum of squares for X,
SS(X) = total sum of squares for X, and
SS(res) = sum of squares for residuals from the between-groups regression.

Thus if F from (A.8) were significant with 1 and m-1 d.f., then the grouped
estimator diverges significantly from the ungrouped estimator.
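The single-regressor statistic in (A.8) involves only a handful of summary quantities, so it can be sketched in a few lines of Python. The function name `feige_watts_f` and the input values below are hypothetical and chosen only to make the arithmetic easy to follow; they are not drawn from the data analyzed in this appendix.

```python
def feige_watts_f(b_grouped, b_ungrouped, ss_xbar, ss_x, ss_res_between, m):
    """Single-regressor Feige-Watts F, following (A.8).

    ss_xbar        -- between-group sum of squares for X
    ss_x           -- total sum of squares for X
    ss_res_between -- residual sum of squares, between-groups regression
    m              -- number of groups
    """
    if not 0.0 < ss_xbar < ss_x:
        raise ValueError("need 0 < SS(Xbar) < SS(X)")
    diff = b_grouped - b_ungrouped
    # Inverse of the variance factor [1/SS(Xbar) - 1/SS(X)].
    weight = 1.0 / (1.0 / ss_xbar - 1.0 / ss_x)
    return diff ** 2 * weight / (ss_res_between / (m - 1))


# Hypothetical inputs: 11 groups, slopes .7 (grouped) and .5 (ungrouped).
f_fw = feige_watts_f(0.7, 0.5, ss_xbar=50.0, ss_x=200.0,
                     ss_res_between=10.0, m=11)
print(round(f_fw, 3))
```

The value would then be referred to the F-distribution with 1 and m-1 degrees of freedom; the lookup of critical values is left out of the sketch.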

Alternative Statistic for Assessing Divergence 
Several questions have been raised with regard to the adequacy of the
Feige-Watts statistic. Feige and Watts' conclusions about the independence
and distributions of $Q_1$ and $Q_2$ have been challenged and the
inherent asymmetry in the components of the numerator and denominator has
been noted (Hubert, Olkin, Rubin, Timm, personal communications). We have
not yet been able to determine the viability of the above-mentioned
criticisms, with the exception that $Q_1$ and $Q_2$ can be shown to be
independent.

However, there is one other point in the development of the Feige-Watts
statistic that represents a clear problem. It appears that Feige and Watts
have chosen an inappropriate denominator for their F-test. In the behavioral
sciences the traditional form of the F-test for differences in regression
models takes the form:

$F = \frac{(R_F^2 - R_R^2)/(df_F - df_R)}{(1 - R_F^2)/(N - df_F)}$

where

$R_F^2$ = squared multiple correlation for the so-called "full"
model (the more inclusive model),

$R_R^2$ = squared multiple correlation for the "restricted" model,

and

$df_F$, $df_R$ = degrees of freedom for the full and restricted models,
respectively.
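For readers who want to see the traditional full-versus-restricted comparison as arithmetic, a minimal Python sketch follows. The function name `model_comparison_f` and the numbers passed to it are hypothetical; the formula is the one displayed above.

```python
def model_comparison_f(r2_full, r2_restricted, df_full, df_restricted, n):
    """Traditional F-test for the gain of a full over a restricted model."""
    numerator = (r2_full - r2_restricted) / (df_full - df_restricted)
    denominator = (1.0 - r2_full) / (n - df_full)
    return numerator / denominator


# Hypothetical values: R^2 rises from .40 to .50 when two terms are added.
f_value = model_comparison_f(0.50, 0.40, df_full=3, df_restricted=1, n=103)
print(round(f_value, 2))
```

Note that the denominator uses the residual variation of the full model, which is the feature the argument in this section turns on.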

There is no recognizable standard for interpreting the comparison of
individual-level and aggregate regression models in this fashion.
Intuitively, however, it is appealing to associate the individual-level
model with the "full" model above and the aggregate with the "restricted".
If this interpretation is defensible, then the residual sum of squares from
the individual-level regression (e'e, where e = Y - Xb) would seem to be
more appropriate than Feige and Watts' choice for the denominator. Thus we
propose the following alternative to Feige and Watts's F-test:



(A.9) $F = \frac{(b_{\bar{Y}\bar{X}} - b_{YX})^2 \left[ \frac{1}{SS(\bar{X})} - \frac{1}{SS(X)} \right]^{-1}}{SS(res)/(N-1)}$

where SS(res) = sum of squared residuals from the individual-level
regression model.

In the remainder of the appendix, we will add subscripts to the F-test
to designate the Feige and Watts version ($F_{FW}$) and our suggested
alternative ($F_B$).
Empirical Example.

Table A.1 contains a summary of the alternative models and procedures
for assessing differences described above. To simplify matters even further
for our illustration, we standardized all observations before grouping.
This also allows us to further simplify the formulas for differences and
F-tests. After standardization, the following hold:

$\sigma^2_X = \sigma^2_Y = \sigma^2_Z = \sigma^2_{\bar{Z}} = 1$

$\beta_{YX} = \rho_{YX}$, $\beta_{YZ} = \rho_{YZ}$, $\beta_{XZ} = \rho_{XZ}$

SS(X) = N-1

The two difference formulas become:

(A.10) $\delta_{WL} = \beta_{Y\bar{X}\cdot X} (1 - \eta^2_X)$ (Firebaugh; Werts and Linn).

(A.11) $\delta_{HB} = \beta_{YZ\cdot X}\,\rho_{XZ} \left[ \frac{1 - \sigma^2_{\bar{X}}}{\sigma^2_{\bar{X}}} \right]$ (Hannan and Burstein).

In the alternate form of the F-test, SS($\bar{X}$) is replaced by
$(N-1)\eta^2_X$ and SS(res)/(N-1) in (A.9) becomes $1 - R^2_{YX}$, where
$R^2_{YX}$ denotes the squared multiple correlation from the
individual-level regression analysis.
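With standardized variables the alternative F-test reduces to three inputs besides N, which the following Python sketch makes explicit. It rests on stated assumptions: SS(X) = N-1, SS of the group means equal to (N-1) times the squared correlation ratio of X, and a single regressor, so that the squared multiple correlation is the square of the ungrouped standardized slope. The function name `burstein_f_standardized` is hypothetical. The inputs are read from the HSMATH entries of Tables A.3 and A.4, with .489 taken to be the correlation ratio.

```python
def burstein_f_standardized(diff, eta_x, r2_yx, n):
    """Alternative F-test (A.9) for standardized X and Y, one regressor.

    diff  -- observed grouped-minus-ungrouped difference in slopes
    eta_x -- correlation ratio of X on the grouping (between-group SD of X)
    r2_yx -- squared multiple correlation, individual-level regression
    n     -- total number of persons
    """
    eta2 = eta_x ** 2
    # [1/((n-1)*eta2) - 1/(n-1)]^(-1) simplifies to (n-1)*eta2/(1-eta2).
    weight = (n - 1) * eta2 / (1.0 - eta2)
    return diff ** 2 * weight / (1.0 - r2_yx)


f_b = burstein_f_standardized(diff=-0.115, eta_x=0.489,
                              r2_yx=0.529 ** 2, n=2676)
print(round(f_b, 2))
```

The result lands near the 15.39 tabled for HSMATH; the small gap is presumably due to rounding in the tabled inputs. Because the denominator depends only on the individual-level fit, it is the same for every grouping method applied to a given regression.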

Our data set contains 2676 observations from entering freshmen at a 
large midwestern university (see Burstein, 1974, 1975a; 1975b for further 
details about the data set) on a variety of variables. Here we estimate 
the standardized coefficients from the regression of academic self-appraisal 
(SRAA) on Achievement (ACH) and Achievement (ACH) on Aptitude (SAT). 

The ungrouped equations are:

SRAA = (.529) ACH [SE($b_{YX}$) = .032]

and

ACH = (.829) SAT [SE($b_{YX}$) = .0105]

Tables A.2 through A.4 provide the parameter estimates from the
Hannan-Burstein and the Werts-Linn approaches, the resulting expected
differences, and F-statistics calculated according to the Feige-Watts and
alternative models for the SRAA-on-ACH regression. Tables A.5 through A.7
provide the same information for the ACH-on-SAT regression.

Results from the Empirical Analysis 

(1) The Werts-Linn approach predicts differences more accurately than 
the Hannan-Burstein approach in 14 of 20 cases, substantially so 
(better by .05) in 6 cases. 

(2) The Hannan-Burstein approach provides more accurate prediction in
only 5 cases, in one case by more than .05.

(3) The Werts-Linn approach performs poorest for the approximation to
random grouping (ID1). This is to be expected since the W-L model
capitalizes on any linear relation between group membership and the
dependent variable.

(4) In general, however, W-L and H-B procedures identify the same 
grouping variables as yielding small differences and large differences. 

(5) The grouping methods that were expected to yield small differences 
between grouped and ungrouped coefficients had small F-statistics by 
both Feige-Watts and Burstein tests. 

(6) With the exception of grouping by ID1 (supposedly random grouping),
large F-statistics coincided with large expected differences in every 
case for the alternative F-test while the Feige-Watts F-tests were 
not significant under conditions of large expected difference for 
two grouping methods. 

Conclusions 

There are several standards by which we can judge the relative merits 
of the techniques described above. The questions we might ask are: 

1. How accurate are the predictions of discrepancy between grouped
and ungrouped coefficients generated by the technique?

2. How easy is it to use each technique?

3. How adaptable is the technique to the nominal grouping variables
we face most often in educational research?

4. What is lost or gained for each technique when we move to more
complex models involving multiple regressors?

5. If, as is often the case, it is impossible to reconstruct
relations at the individual level, what happens to the utility
of the proposed technique?

The complications alluded to in question 5 are of a different order of
magnitude than those in the other questions and this question will be
considered separately. We discuss the comparative advantages and
disadvantages of the WL and HB models first and then talk about the relative
merits of the alternative F-tests and their utility in relation to the WL
and HB methods.

Comparing W-L and H-B. Though both the Werts-Linn and Hannan-Burstein
approaches identify essentially the same "good" and "poor" grouping methods,
the Werts-Linn approach yields slightly more accurate predictions in the
single-regressor case and their procedure for estimating differences does
not have to be altered when the grouping variable is nominal (such as
school). In the full information situation (both individual and grouped
data accessible), it is equally easy to classify grouping variables as good
or poor using either approach.

The primary disadvantage of the Werts-Linn approach is that there has
to be one variable consisting of group means associated with each regressor
in the model, which quickly becomes tedious when there are multiple
regressors. When the grouping variable is ordered, this presents less of a
problem in the structural equations approach advocated by Hannan and
Burstein, as only a single grouping variable is entered in the analysis.

There is as yet no consensus regarding the best method of modeling
the effects of a nominal grouping variable, especially in the
multiple-regressor case (Burstein (1974, 1975b) discusses some alternative
strategies). With both approaches, we would need to generate a single
variable (or some small set of variables) with metric properties that
distinguish among the groups. Otherwise, the modeling of the multivariate
case will be too cumbersome.

Alternative F-Tests. The utility of the alternative F-tests is not
affected by either having a nominal grouping variable or having multiple
regressors. Their chief problem is that, in the final analysis, the
F-tests are global tests and therefore are insensitive to differential
fitting of regression parameters in the multiple-regression case. There is
evidence that grouping differentially affects the estimation of regression
coefficients. For example, grouping on one regressor yields a more
consistent estimate of its own parameter than of the other parameters in
multiple-regression models (Burstein, 1975b; Burstein and Hannan, 1975).
If this is the case, we would prefer to use a technique that will predict
where the differences are largest and smallest, and both the Werts-Linn
and Hannan-Burstein approaches are better suited for this task.

One further complication with using the F-tests is that of ease of 
calculation in the multiple-regressor case. We (Burstein and Hannan, 
1975) are in the process of modifying the Feige-Watts software so that the 
F-statistics can be generated for a variety of data sets. 

If the present examples are any indication, the F-statistics generated 
by our alternative test more closely approximate the results from other 
approaches than the Feige-Watts statistics do. Furthermore, the alterna- 
tive test is somewhat easier to calculate (constant denominator over a 
variety of different grouping methods). However, more empirical work, and
perhaps computer simulation, should be carried out before making any final
judgment in the comparison of alternative F-tests.

Utility in the Limited Information Case . Often in educational research, 
we analyze grouped data simply because the data on individuals are unavail- 
able or unobtainable in disaggregated form. For example, schools often 
report only school-mean test scores and demographic characteristics to the 
public, and in many cases, fail to retain individual information. 

This limited information situation is troublesome for all the methods
discussed above. One needs to be able to estimate $\beta_{Y\bar{X}\cdot X}$
for the Werts-Linn approach and neither $\sigma_{XY}$ nor $\sigma^2_X$,
which enter the calculations, is ascertainable from grouped data alone. The
same problem exists in applying the Hannan-Burstein approach, as
$\sigma^2_X$ is necessary for calculating $\beta_{YZ\cdot X}$ and also
enters the expected difference formula.

We (Burstein, 1974, 1975a, 1975b; Burstein and Hannan, 1975) have
explored the possibility of using an approximation for the expected
difference formula which does not require the investigator to know
$\sigma^2_X$. Our results to date are mixed, and we are not yet prepared to
offer firm guidelines on how to proceed, with one exception. If the grouping
method has a stronger relationship to the outcome variable than to the
regressor ($\rho_{YZ} > \rho_{XZ}$ or, for the Werts-Linn approach,
$\eta_Y > \eta_X$), large differences between grouped and ungrouped
coefficients are inevitable. To some degree the reverse is true: small
differences are associated with cases where $\rho_{XZ} > \rho_{YZ}$,
especially when the difference in magnitude between $\rho_{XZ}$ and
$\rho_{YZ}$ is large.

We experience even greater difficulties in applying either F-test in 
the limited information case. Estimates of individual-level regression 
coefficients and the covariance matrix for the regressors enter the 
calculation of the tests and neither is obtainable in the "grouped data 
only" situation. The limited usefulness of these techniques under the 
present circumstances is not surprising since Feige and Watts first pro- 
posed their techniques for application when individual-level data is avail- 
able but must remain confidential (cf. 1972). Furthermore, economists in 
general have focused on applications where the existence of
individual-level data is not a problem. (A notable exception is Haitovsky
(1966; 1968).)

Where do the caveats implied above leave us? Well, unless viable 
modifications of the techniques described above can be generated that are 
less sensitive to problems of limited information, we are left with sound 
alternative mathematical models with limited utility for addressing the 
practical problems encountered in educational research. 

Perhaps this sobering analysis is for the best. Perhaps, educational 
record keeping and data bases will become more informative if the persons 
charged with the responsibility of collecting educational data are made 
aware of the methodological morass that results from failing to keep track 
of data at the individual level. For, whether one is interested in purely
academic or in policy questions, there is no viable substitute for the
following guides for collecting and maintaining educational data (Burstein,
1975a, 1976; Burstein and Knapp, 1975; Burstein and Smith, 1975):

1. Measure all variables at their lowest possible level.

2. Data from individual students should be matched with data
from their teachers/classrooms and characteristics of
their school setting.

3. Keep track of all information at its lowest possible level
with retrieval capabilities at multiple levels.

Anything less can remove the possibility of applying the "appropriate"
analysis procedure to answer the questions that the policy maker and/or
researcher wants answered.
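One way to picture the three guides is as a single file of student-level records that carries classroom and school identifiers, from which summaries at any higher level can be retrieved on demand. The Python sketch below is purely illustrative: the field names, the four records, and the helper `mean_by_level` are all hypothetical and are not drawn from any data set discussed here.

```python
from collections import defaultdict

# Each record is kept at the lowest level (the student) and is matched
# to its classroom and school.
records = [
    {"student": "s1", "classroom": "c1", "school": "A", "score": 52.0},
    {"student": "s2", "classroom": "c1", "school": "A", "score": 60.0},
    {"student": "s3", "classroom": "c2", "school": "A", "score": 71.0},
    {"student": "s4", "classroom": "c3", "school": "B", "score": 45.0},
]


def mean_by_level(rows, level, variable):
    """Retrieve group means of a variable at any recorded level."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row[level]][0] += row[variable]
        totals[row[level]][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


school_means = mean_by_level(records, "school", "score")
classroom_means = mean_by_level(records, "classroom", "score")
print(school_means, classroom_means)
```

Aggregates can always be rebuilt from records like these, whereas the individual-level relations cannot be recovered once only the school means are retained.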
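Table A.1. Alternative models for assessing differences between grouped
and ungrouped regression coefficients: single-regressor case.

PURPOSE: To assess differences between $b_{\bar{Y}\bar{X}}$ (between-group
coefficient) and $b_{YX}$ (unbiased estimate of $\beta_{YX}$ from
individual-level data).

BASIC MODELS:

Werts and Linn; Firebaugh

  $Y_{ij} = \alpha + \beta_{YX\cdot\bar{X}} X_{ij} + \beta_{Y\bar{X}\cdot X} \bar{X}_{\cdot j} + e_{ij}$

  with $\beta_{X\bar{X}}$ from $X_{ij} = C + \beta_{X\bar{X}} \bar{X}_{\cdot j} + q_{ij}$

Hannan and Burstein (Structural Equations)

  $Y_{ij} = \alpha + \beta_{YX\cdot Z} X_{ij} + \beta_{YZ\cdot X} Z_{ij} + v_{ij}$

  with $Z_{ij} = Z_{\cdot j}$

PREDICTED DIFFERENCE:

Werts and Linn

  DIFFERENCE = $\delta_{WL} = \beta_{Y\bar{X}\cdot X} (1 - \eta^2_X)$

  where $\eta^2_X = SS(\bar{X}) / SS(X)$

Hannan and Burstein

  DIFFERENCE = $\delta_{HB} = \beta_{YZ\cdot X}\,\beta_{XZ}\,\sigma^2_Z \left[ \frac{\sigma^2_X - \sigma^2_{\bar{X}}}{\sigma^2_X\,\sigma^2_{\bar{X}}} \right]$

  since $\sigma^2_{\bar{Z}} = \sigma^2_Z$

F-STATISTICS:

Feige-Watts

  $F = \frac{(b_{\bar{Y}\bar{X}} - b_{YX})^2 \left[ \frac{1}{SS(\bar{X})} - \frac{1}{SS(X)} \right]^{-1}}{SS(res)/(m-1)}$

  with 1 and m-1 d.f.

  where SS(res) = sum of squares for residuals from the between-groups
  regression

  m = number of groups formed.

Burstein

  $F = \frac{(b_{\bar{Y}\bar{X}} - b_{YX})^2 \left[ \frac{1}{SS(\bar{X})} - \frac{1}{SS(X)} \right]^{-1}}{SS(res)/(N-1)}$

  with 1 and N-1 d.f.

  where SS(res) = sum of squares for residuals from the individual-level
  regression

  N = total number of persons.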
Table A.2. Estimates of parameters relating ACH (X) and SRAA (Y) to possible
grouping variables (Z).(a)

Parameter estimates, with standard errors in parentheses(b):
(1) $\beta_{YX\cdot Z}$; (2) $\beta_{YZ\cdot X}$; (3) $\beta_{XZ}$;
(4) $\beta_{YZ}$; (5) $\sigma_{\bar{X}}$.

Variable  Group
Name      Size (m)   (1)            (2)            (3)            (4)           (5)

ID1       10         .528 (.0164)  -.011 (.0164)  -.042 (.0193)  -.033 (.0193)  .078
SAT2      13         .194 (.0282)   .406 (.0282)   .827 (.0109)   .566 (.0160)  .831
PARING    10         .527 (.0164)   .028 (.0164)   .070 (.0193)   .064 (.0193)  .122
ACH2      10         .460 (.0896)   .070 (.0896)   .983 (.0035)   .522 (.0165)  .984
POPED      6         .519 (.0165)   .073 (.0165)   .139 (.0192)   .145 (.0191)  .150
NOBOOK     5         .511 (.0164)   .122 (.0164)   .146 (.0191)   .196 (.0190)  .148
HSPHYS     5         .515 (.0173)   .046 (.0173)   .318 (.0183)   .209 (.0189)  .365
HSMATH     5         .561 (.0187)  -.066 (.0187)   .479 (.0170)   .202 (.0189)  .489
SRAA2      5         .139 (.0099)   .819 (.0099)   .476 (.0170)   .885 (.0090)  .481
PARASP     5         .520 (.0162)   .138 (.0162)   .066 (.0193)   .172 (.0190)  .077

(a) All variables have been standardized prior to grouping so that
$\sigma_X = \sigma_Y = \sigma_Z = 1$, $\beta_{XZ} = \rho_{XZ}$, and
$\beta_{YZ} = \rho_{YZ}$.

(b) Numbers in parentheses are standard errors of regression coefficients.
Table A.3. Estimates of parameters relating ACH (=X) and SRAA (=Y) to group
means on the regressor (=$\bar{X}$) for selected grouping variables.(a)

Parameter estimates, with standard errors in parentheses(b):
(1) $\beta_{YX\cdot\bar{X}}$; (2) $\beta_{Y\bar{X}\cdot X}$;
(3) $\beta_{X\bar{X}}$; (4) $\beta_{Y\bar{X}}$; (5) $\sigma_{\bar{X}}$.

Variable  Group
Name      Size (m)   (1)            (2)            (3)            (4)           (5)

ID1       10         .531 (.0165)   .047 (.2085)  -.387 (.2443)  -.159 (.2454)  .078
SAT2      13         .220 (.0286)   .448 (.0342)   .992 (.0129)   .666 (.0193)  .838
PARING    10         .530 (.0166)   .028 (.1359)  1.000 (.1570)   .558 (.1585)  .122
ACH2      10         .522 (.0904)   .009 (.0920)  1.000 (.0036)   .531 (.0169)  .984
POPED      6         .522 (.0166)   .388 (.1109)  1.000 (.1275)   .911 (.1283)  .150
NOBOOK     5         .513 (.0165)   .820 (.1120)  1.000 (.1297)  1.333 (.1292)  .148
HSPHYS     5         .525 (.0177)   .047 (.0486)  1.000 (.0494)   .572 (.0522)  .365
HSMATH     5         .567 (.0188)  -.153 (.0386)  1.000 (.0345)   .414 (.0389)  .489
SRAA2      5         .134 (.0099)  1.719 (.0206)  1.000 (.0353)  1.853 (.0187)  .481
PARASP     5         .522 (.0164)  1.425 (.2126)  1.000 (.2500)  1.947 (.2489)  .077

(a) X and Y have been standardized so that $\sigma_X = \sigma_Y = 1$.
E($\beta_{X\bar{X}}$) = 1 for all regressions.

(b) Numbers in parentheses are the standard errors of the regression
coefficients.
Table A.4. Assessment of differences between grouped and ungrouped
coefficients (SRAA on ACH): a comparison of Werts-Linn with Hannan-Burstein
predictions and alternative F-tests.(a)

Columns: (1) $b_{\bar{Y}\bar{X}}$; (2) SE($b_{\bar{Y}\bar{X}}$)(c);
(3) observed difference, $b_{\bar{Y}\bar{X}} - b_{YX}$; (4) Werts-Linn
predicted difference(d); (5) Hannan-Burstein predicted difference(d);
(6) Feige-Watts F-test(d); (7) Burstein alternative F-test(d).

Grouping
Variables(b)   (1)     (2)      (3)      (4)      (5)      (6)         (7)

ACH2           .531   .0615     .002     .000     .002      .049        .667
PARING         .558   .1314     .029     .027     .129      .051        .049
HSPHYS         .571   .0915     .043     .041     .095      .252       1.045
ID1            .442   .1831    -.087     .046     .075      .228        .173
HSMATH         .414   .0248    -.115    -.116     .100    27.82**     15.39***
SAT2           .671   .0670     .142     .133     .150    14.52**    166.67***
POPED          .911   .1626     .382     .380     .440     5.64       12.50***
NOBOOK        1.334   .1133     .805     .802     .800    51.76**    195.98***
SRAA2         1.853   .0631    1.324    1.321    1.295   571.66**   1962.79***
PARASP        1.946   .7339    1.417    1.417    1.519     3.75       44.73***

(a) Estimates from ungrouped data: $b_{YX}$ = .529; SE($b_{YX}$) = .0032.

(b) Ordered on the basis of size of observed difference.

(c) Standard errors of between-group coefficients do not include component
for bias in estimation.

(d) See Table A.1 for appropriate formulas.

** Exceeds the 95 percent critical value for F with 1 and m-1 d.f.
*** Exceeds the 95 percent critical value for F with 1 and N-1 d.f.
Table A.5. Estimates of parameters relating SAT (X) and ACH (Y) to possible
grouping variables (Z).(a)

Parameter estimates, with standard errors in parentheses(b):
(1) $\beta_{YX\cdot Z}$; (2) $\beta_{YZ\cdot X}$; (3) $\beta_{XZ}$;
(4) $\beta_{YZ}$; (5) $\sigma_{\bar{X}}$.

Variable  Group
Name      Size (m)   (1)            (2)            (3)            (4)           (5)

ID1       10         .839 (.0105)  -.003 (.0105)  -.046 (.0193)  -.042 (.0193)  .069
SAT2      13         .884 (.0662)  -.042 (.0662)   .987 (.0031)   .828 (.0109)  .989
PARING    10         .838 (.0106)   .006 (.0106)   .076 (.0193)   .070 (.0193)  .146
ACH2      10         .082 (.0061)   .916 (.0061)   .827 (.0109)   .983 (.0035)  .835
POPED      6         .838 (.0106)   .007 (.0106)   .157 (.0191)   .139 (.0192)  .169
NOBOOK     5         .844 (.0107)  -.025 (.0107)   .203 (.0189)   .146 (.0191)  .204
HSPHYS     5         .811 (.0107)   .109 (.0107)   .257 (.0187)   .318 (.0183)  .294
HSMATH     5         .765 (.0104)   .214 (.0104)   .346 (.0181)   .480 (.0170)  .349
SRAA2      5         .811 (.0123)   .054 (.0123)   .520 (.0165)   .476 (.0170)  .531
PARASP     5         .839 (.0106)  -.007 (.0106)   .087 (.0193)   .066 (.0193)  .101

(a) All variables have been standardized prior to grouping so that
$\sigma_X = \sigma_Y = \sigma_Z = 1$, $\beta_{XZ} = \rho_{XZ}$, and
$\beta_{YZ} = \rho_{YZ}$.

(b) Numbers in parentheses are the standard errors of the regression
coefficients.
Table A.6. Estimates of parameters relating SAT (=X) and ACH (=Y) to
group means on the regressor (=$\bar{X}$) for selected grouping
variables.(a)

Parameter estimates, with standard errors in parentheses(b):
(1) $\beta_{YX\cdot\bar{X}}$; (2) $\beta_{Y\bar{X}\cdot X}$;
(3) $\beta_{X\bar{X}}$; (4) $\beta_{Y\bar{X}}$; (5) $\sigma_{\bar{X}}$.
Entries that cannot be read in the source copy are marked [illegible].

Grouping  Group
Variable  Size (m)   (1)            (2)                  (3)                  (4)                  (5)

ID1       10         .839 (.0105)  [illegible] (.1490)  [illegible] (.2738)  [illegible] (.2737)  [illegible]
SAT2      13         .875 (.0663)  -.037 (.0671)        1.000 (.0031)        [illegible]          [illegible]
PARING    10         .839 (.0106)  -.019 (.0721)        [illegible] (.1297)  [illegible] (.1302)  [illegible]
ACH2      10         .089 (.0078)  1.080 (.0093)        1.000 (.0128)        [illegible] (.0053)  [illegible]
POPED      6         .838 (.0107)   .040 (.0634)        1.000 (.1133)        [illegible] (.1136)  [illegible]
NOBOOK     5         .844 (.0107)  -.125 (.0526)        1.000 (.0927)         .719 (.0937)        .204
HSPHYS     5         .801 (.0107)   .436 (.0365)        1.000 (.0629)        1.237 (.0613)        .294
HSMATH     5         .778 (.0103)   .584 (.0296)         .878 (.0528)        1.266 (.0497)        .349
SRAA2      5         .815 (.0124)   .084 (.0234)        1.000 (.0321)         .899 (.0321)        .531
PARASP     5         .840 (.0106)  -.095 (.1051)         .999 (.1911)         .744 (.1916)        .109

(a) X and Y have been standardized so that $\sigma_X = \sigma_Y = 1$.
E($\beta_{X\bar{X}}$) = 1 for all regressions.

(b) Numbers in parentheses are the standard errors of the regression
coefficients.
Table A.7. Assessment of differences between grouped and ungrouped
coefficients (ACH on SAT): a comparison of Werts-Linn with Hannan-Burstein
predictions and alternative F-tests.(a)

Columns: (1) $b_{\bar{Y}\bar{X}}$; (2) SE($b_{\bar{Y}\bar{X}}$)(c);
(3) observed difference; (4) Werts-Linn predicted difference(d);
(5) Hannan-Burstein predicted difference(d); (6) Feige-Watts F-test(d);
(7) Burstein alternative F-test(d).

Grouping
Variables(b)   (1)     (2)      (3)      (4)      (5)      (6)         (7)

SAT2           .838   .0190    -.001    -.001    -.001      .00         .000
PARING         .817   .0598    -.022    -.018     .021      .14         .0986
POPED          .877   .0685     .039     .039     .038      .33         .399
SRAA2          .899   .0543     .060     .060     .072      .18        1.336
PARASP         .744   .0903    -.095    -.094    -.059     1.12         .837
NOBOOK         .718   .0372    -.121    -.120    -.174    11.10**      5.800***
ID1           1.053   .2168     .214    -.231     .029     2.47        2.002
ACH2          1.168   .0541     .329     .327     .329   197.73**   3647.420***
HSPHYS        1.237   .0422     .398     .398     .295    98.74**    136.820***
HSMATH        1.396   .0478     .557     .513     .531   149.28**    388.010***

(a) Estimates from ungrouped data: $b_{YX}$ = .839; SE($b_{YX}$) = .0105.

(b) Ordered on the basis of size of observed bias.

(c) Standard errors of between-group coefficients do not include component
for bias in estimation.

(d) See Table A.1 for appropriate formulas.

** Exceeds the 95 percent critical value for F with 1 and m-1 d.f.
*** Exceeds the 95 percent critical value for F with 1 and N-1 d.f.
APPENDIX FOOTNOTES

1. Burstein (1974, 1975b) discusses ways of handling nominal characteristics
in structural equation models. Ironically, scaling by substituting the
group mean on the regressor is one way of ensuring a suitable metric
for Z.

2. We have simplified the notation somewhat from earlier presentations to
avoid the introduction of grouping matrices G and H.